-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-5625][VL] Support window range frame #5626
[GLUTEN-5625][VL] Support window range frame #5626
Conversation
Run Gluten Clickhouse CI |
5 similar comments
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
f5d363b
to
0484902
Compare
Run Gluten Clickhouse CI |
@PHILO-HE is the PR offload SQL like below?
|
Let me do a test once merged. Thank you! |
@WangGuangxin can you help to resolve conflicts? |
0484902
to
7d570fb
Compare
Run Gluten Clickhouse CI |
@FelixYBW done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will spare some time to review in next week. Sorry for late response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Just added several comments.
return std::make_tuple( | ||
core::WindowNode::BoundType::kFollowing, | ||
exprConverter_->toVeloxExpr(boundType.following().ref(), inputType)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the only difference of the two branches is the expression. Can we differentiate the expressions only and reuse the rest of code?
return std::make_tuple( | ||
core::WindowNode::BoundType::kPreceding, | ||
exprConverter_->toVeloxExpr(boundType.preceding().ref(), inputType)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
I tried to reuse the logic here, but since the type of boundType.preceding()
and boundType.following()
are different, so I have to pass the offset
and ref
instead of the expression itself
String upperBound, | ||
String lowerBound, | ||
org.apache.spark.sql.catalyst.expressions.Expression upperBound, | ||
org.apache.spark.sql.catalyst.expressions.Expression lowerBound, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we import org.apache.spark.sql.catalyst.expressions.Expression
and use it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Expression
is conflict with io.substrait.proto.Expression
which alreay imported in this class, and Java doen't have the alias grammer like scala
if (offset < 0) { | ||
Expression.WindowFunction.Bound.Preceding.Builder offsetPrecedingBuilder = | ||
Expression.WindowFunction.Bound.Preceding.newBuilder(); | ||
offsetPrecedingBuilder.setOffset(0 - offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need -offset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need. In fact, this branch is the existing logic
|
||
// the reference to pre-project range frame boundary. | ||
Expression ref = 2; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you document this change to Substrait in https://github.com/apache/incubator-gluten/blob/main/docs/developers/SubstraitModifications.md?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
7d570fb
to
d5195af
Compare
Run Gluten Clickhouse CI |
2 similar comments
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
c2b472f
to
7947976
Compare
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your great work! Just posted some comments. Please check whether it makes sense.
throw new UnsupportedOperationException( | ||
"Unsupported Window Function Frame Type:" + boundType); | ||
} | ||
} else if (boundType.foldable()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if branch is just for ClickHouse backend? It would be nice to leave some comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clickhouse backend and velox backend with ROW frame. Comments added
} | ||
} catch (NumberFormatException e) { | ||
throw new UnsupportedOperationException( | ||
"Unsupported Window Function Frame Type:" + boundType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add statements to catch this exception? I think Spark parser has already checked the validity of foldable bound (e.g., 100a is not valid). The catching looks unnecessary.
case swf: SpecifiedWindowFrame if needPreComputeRangeFrame(swf) => | ||
// This is guaranteed by Spark, but we still check it here | ||
if (orderSpecs.size != 1) { | ||
throw new GlutenNotSupportException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use GlutenException
. Generally GlutenNotSupportException
is caught in fallback handling for some limitation cases. But for this case, I think we can directly let a runtime exception happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
return std::make_tuple( | ||
core::WindowNode::BoundType::kFollowing, | ||
std::make_shared<core::ConstantTypedExpr>(BIGINT(), variant(boundType.following().offset()))); | ||
specifiedBound(following.has_offset(), following.offset(), following.ref())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can has_offset
be true for Velox backend after the java side handling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's reused both for ROW and RANGE frame, so offset is used for ROW frame here.
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two minor comments. We can merge it after CI is fixed. Thanks!
offsetFollowingBuilder.setOffset(offset); | ||
builder.setFollowing(offsetFollowingBuilder.build()); | ||
} | ||
} catch (NumberFormatException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks we can also remove this try/catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
} catch (NumberFormatException e) { | ||
} else { | ||
throw new UnsupportedOperationException( | ||
"Unsupported Window Function Frame Type:" + boundType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not frame type, but bound type, please help fix it in your pr.
"Unsupported Window Function Frame Bound Type: " + boundType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Run Gluten Clickhouse CI |
@WangGuangxin, the pr looks good to me. Could you fix the CI failure? Looks relevant to this pr. |
Run Gluten Clickhouse CI |
@WangGuangxin, please also rebase the code. Thanks! |
8e87f6e
to
87a4692
Compare
Run Gluten Clickhouse CI |
val a = expressionMap | ||
.getOrElseUpdate( | ||
bound.canonicalized, | ||
Alias(Add(orderSpec.child, bound), generatePreAliasName)()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really support reuse the reference for the same bound literal ? What happens if the two window expressions have the same frame but requires different ordering column.
sum(x) OVER (
PARTITION BY c1
ORDER BY c2 RANGE BETWEEN 1 preceding AND CURRENT ROW
),
sum(x) OVER (
PARTITION BY c1
ORDER BY c3 RANGE BETWEEN 1 preceding AND CURRENT ROW
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really support reuse the reference for the same bound literal ? What happens if the two window expressions have the same frame but requires different ordering column.
sum(x) OVER ( PARTITION BY c1 ORDER BY c2 RANGE BETWEEN 1 preceding AND CURRENT ROW ), sum(x) OVER ( PARTITION BY c1 ORDER BY c3 RANGE BETWEEN 1 preceding AND CURRENT ROW )
let me check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The frame bouding is a reference, which will be pre-project before window operator. And the sort/partition are calculated in window operator. So it's ok to reuse it.
I add a UT for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new added UT succeed in spark3.2, but failed in spark3.4/3.5. I'll fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for late response, all UT passed now, the failed one seems unrelated. Please help review again @PHILO-HE @ulysses-you
The reason why it was failed is because sort column l_discount
is a decimal, and when we do Add(sort_col, offset)
, the precision of decimal type will change, so that the data type check failed in Spark (https://github.com/apache/spark/blob/82a84ede6a47232fe3af86672ceea97f703b3e8a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L83)
In fact, we only support ByteType | ShortType | IntegerType | LongType | DateType
by design, which is checked in https://github.com/apache/incubator-gluten/pull/5626/files#diff-68925b6a5ee67a942a976971ae6727a45a0b166d200d8d983454169e7c3bd1f6R316, but even when WindowExec fallback by this check, the PullOutPreProjection
rule will still be executed.
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
2680b9f
to
e726284
Compare
Run Gluten Clickhouse CI |
e726284
to
339751a
Compare
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you @WangGuangxin
What changes were proposed in this pull request?
Support window range frame by pre-project frame offset for non-SpecialFrameBoundary, based on the discussion facebookincubator/velox#7557
(Fixes: #5625)
How was this patch tested?
UT